Impala: A Modern, Open-Source SQL Engine for Hadoop

Authors

  • Marcel Kornacker
  • Alexander Behm
  • Victor Bittorf
  • Taras Bobrovytsky
  • Casey Ching
  • Alan Choi
  • Justin Erickson
  • Martin Grund
  • Daniel Hecht
  • Matthew Jacobs
  • Ishaan Joshi
  • Lenni Kuff
  • Dileep Kumar
  • Alex Leblang
  • Nong Li
  • Ippokratis Pandis
  • Henry Robinson
  • David Rorke
  • Silvius Rus
  • John Russell
  • Dimitris Tsirogiannis
  • Skye Wanderman-Milne
  • Michael Yoder

Abstract

Cloudera Impala is a modern, open-source MPP SQL engine architected from the ground up for the Hadoop data processing environment. Impala provides low latency and high concurrency for BI/analytic read-mostly queries on Hadoop, which batch frameworks such as Apache Hive do not deliver. This paper presents Impala from a user's perspective, gives an overview of its architecture and main components, and briefly demonstrates its superior performance compared with other popular SQL-on-Hadoop systems.

Similar resources

ISP: Large-Scale In-memory Spatial Data Processing System (Demo Paper)

A huge amount of spatial data, such as GPS locations, is being generated every day, which poses major challenges for efficient spatial data processing. Most existing big spatial data processing techniques are based on disk-resident systems. They have not taken full advantage of modern hardware, such as large main memory capacities and multi-core processors. In this paper, we demonstrate our I...

Runtime Code Generation in Cloudera Impala

In this paper we discuss how runtime code generation can be used in SQL engines to achieve better query execution times. Code generation allows query-specific information known only at runtime, such as column types and expression operators, to be used in performance-critical functions as if they were available at compile time, yielding more efficient implementations. We present Cloudera Impala,...

SQL-on-Hadoop: Full Circle Back to Shared-Nothing Database Architectures

SQL query processing for analytics over Hadoop data has recently gained significant traction. Among many systems providing some SQL support over Hadoop, Hive is the first native Hadoop system that uses an underlying framework such as MapReduce or Tez to process SQL-like statements. Impala, on the other hand, represents the new emerging class of SQL-on-Hadoop systems that exploit a shared-nothin...

VerdictDB: Universalizing Approximate Query Processing

Despite 25 years of research in academia, approximate query processing (AQP) has had little industrial adoption. One of the major causes for this slow adoption is the reluctance of traditional vendors to make radical changes to their legacy codebases, and the preoccupation of newer vendors (e.g., SQL-on-Hadoop products) with implementing standard features. On the other hand, the few AQP engines...

RSECM: Robust Search Engine using Context-based Mining for Educational Big Data

With accelerating growth in the educational sector, aided by ICT and cloud-based services, there is a consistent rise in educational big data, where storage and processing become the primary challenges. Although many recent attempts have used open-source frameworks, e.g., Hadoop, for storage, there are still reported issues in sufficient security management and data analyzing ...


Publication date: 2015